class: center, middle, inverse, title-slide

.title[
# Confidence interval
]
.author[
### Indrek Soidla
]
.institute[
### Tartu Ülikool
]
.date[
### 2023/09/23 (updated: 2023-10-04)
]

---

### Load the data and extract the Estonian data

```r
library(haven)
library(tidyverse)

r8 <- read_spss("data/ESS8e02_2.sav")

ee8 <- r8 |> 
  filter(cntry == "EE") |> 
  select(lkredcc, region, pspwght)
```

---

### Check the distribution parameters of the variable `lkredcc`

```r
library(summarytools)
descr(ee8$lkredcc)
```

```
## Descriptive Statistics  
## ee8$lkredcc  
## Label: Imagine large numbers of people limit energy use, how likely reduce climate change  
## N: 2019  
## 
##                     lkredcc
## ----------------- ---------
##              Mean      4.70
##           Std.Dev      2.43
##               Min      0.00
##                Q1      3.00
##            Median      5.00
##                Q3      6.00
##               Max     10.00
##               MAD      2.97
##               IQR      3.00
##                CV      0.52
##          Skewness     -0.13
##       SE.Skewness      0.06
##          Kurtosis     -0.54
##           N.Valid   1905.00
##         Pct.Valid     94.35
```

---

### Recode the variable `region` to Estonian region names

```r
ee8$region <- dplyr::recode(as.factor(ee8$region),
                            "EE001" = "Põhja-Eesti",
                            "EE004" = "Lääne-Eesti",
                            "EE006" = "Kesk-Eesti",
                            "EE007" = "Kirde-Eesti",
                            "EE008" = "Lõuna-Eesti")

table(ee8$region)
```

```
## 
## Põhja-Eesti Lääne-Eesti  Kesk-Eesti Kirde-Eesti Lõuna-Eesti 
##         824         236         213         209         537 
```

---

### Compute the mean ratings and their confidence limits by region with the `meantables` package

```r
# install.packages("meantables")
library(meantables)

ee8 |> 
  group_by(region) |> 
  mean_table(lkredcc)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> response_var </th>
   <th style="text-align:left;"> group_var </th>
   <th style="text-align:left;"> group_cat </th>
   <th style="text-align:right;"> n </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
   <th style="text-align:right;"> sem </th>
   <th style="text-align:right;"> lcl </th>
   <th style="text-align:right;"> ucl </th>
   <th style="text-align:right;"> min </th>
   <th style="text-align:right;"> max </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td
style="text-align:left;"> lkredcc </td> <td style="text-align:left;"> region </td> <td style="text-align:left;"> Põhja-Eesti </td> <td style="text-align:right;"> 792 </td> <td style="text-align:right;"> 4.98 </td> <td style="text-align:right;"> 2.34 </td> <td style="text-align:right;"> 0.0830960 </td> <td style="text-align:right;"> 4.82 </td> <td style="text-align:right;"> 5.15 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> lkredcc </td> <td style="text-align:left;"> region </td> <td style="text-align:left;"> Lääne-Eesti </td> <td style="text-align:right;"> 232 </td> <td style="text-align:right;"> 4.57 </td> <td style="text-align:right;"> 2.41 </td> <td style="text-align:right;"> 0.1582965 </td> <td style="text-align:right;"> 4.26 </td> <td style="text-align:right;"> 4.88 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> lkredcc </td> <td style="text-align:left;"> region </td> <td style="text-align:left;"> Kesk-Eesti </td> <td style="text-align:right;"> 181 </td> <td style="text-align:right;"> 4.53 </td> <td style="text-align:right;"> 2.67 </td> <td style="text-align:right;"> 0.1986049 </td> <td style="text-align:right;"> 4.14 </td> <td style="text-align:right;"> 4.92 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> lkredcc </td> <td style="text-align:left;"> region </td> <td style="text-align:left;"> Kirde-Eesti </td> <td style="text-align:right;"> 185 </td> <td style="text-align:right;"> 4.08 </td> <td style="text-align:right;"> 2.43 </td> <td style="text-align:right;"> 0.1784336 </td> <td style="text-align:right;"> 3.73 </td> <td style="text-align:right;"> 4.43 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> lkredcc </td> <td style="text-align:left;"> region 
</td>
   <td style="text-align:left;"> Lõuna-Eesti </td>
   <td style="text-align:right;"> 515 </td>
   <td style="text-align:right;"> 4.61 </td>
   <td style="text-align:right;"> 2.46 </td>
   <td style="text-align:right;"> 0.1082654 </td>
   <td style="text-align:right;"> 4.40 </td>
   <td style="text-align:right;"> 4.82 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
</tbody>
</table>

---

### Dot plot

.pull-left[
```r
lk_95 <- ee8 |> 
  group_by(region) |> 
  mean_table(lkredcc)
```

```r
ggplot(lk_95, aes(x = group_cat, y = mean)) +
  geom_point()
```
]

.pull-right[
<!-- -->
]

---

### Interval plot

.pull-left[
```r
ggplot(lk_95, aes(x = group_cat, y = mean)) +
  geom_point() +
  geom_errorbar(
    aes(ymin = lcl, ymax = ucl),
    width = 0.1)
```
]

.pull-right[
<!-- -->
]

---

### What does the width of a confidence interval depend on?

```r
ee8 |> 
  group_by(region) |> 
  mean_table(lkredcc) |> 
  mutate(laius = ucl - lcl) |>   # laius = interval width
  select(group_cat, n, mean, sd, laius)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> group_cat </th>
   <th style="text-align:right;"> n </th>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
   <th style="text-align:right;"> laius </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Põhja-Eesti </td>
   <td style="text-align:right;"> 792 </td>
   <td style="text-align:right;"> 4.98 </td>
   <td style="text-align:right;"> 2.34 </td>
   <td style="text-align:right;"> 0.33 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Lääne-Eesti </td>
   <td style="text-align:right;"> 232 </td>
   <td style="text-align:right;"> 4.57 </td>
   <td style="text-align:right;"> 2.41 </td>
   <td style="text-align:right;"> 0.62 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Kesk-Eesti </td>
   <td style="text-align:right;"> 181 </td>
   <td style="text-align:right;"> 4.53 </td>
   <td style="text-align:right;"> 2.67 </td>
   <td style="text-align:right;"> 0.78 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Kirde-Eesti </td>
   <td style="text-align:right;"> 185 </td>
   <td
style="text-align:right;"> 4.08 </td>
   <td style="text-align:right;"> 2.43 </td>
   <td style="text-align:right;"> 0.70 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Lõuna-Eesti </td>
   <td style="text-align:right;"> 515 </td>
   <td style="text-align:right;"> 4.61 </td>
   <td style="text-align:right;"> 2.46 </td>
   <td style="text-align:right;"> 0.42 </td>
  </tr>
</tbody>
</table>

---

### How does the width of a confidence interval depend on the confidence level?

We also compute the confidence limits at the 99% and 90% confidence levels.

```r
# t_prob is the t-distribution quantile 1 - alpha/2:
# 0.995 gives a 99% interval, 0.95 gives a 90% interval
lk_99 <- ee8 |> 
  group_by(region) |> 
  mean_table(lkredcc, t_prob = 0.995)

lk_90 <- ee8 |> 
  group_by(region) |> 
  mean_table(lkredcc, t_prob = 0.95)
```

---

### Confidence intervals at different confidence levels

```r
ggplot(lk_95, aes(x = group_cat, y = mean)) +
  geom_point() +
* geom_errorbar(data = lk_99, aes(ymin = lcl, ymax = ucl),
*               width = 0.05, colour = "blue") +
  geom_errorbar(aes(ymin = lcl, ymax = ucl), width = 0.1) +
* geom_errorbar(data = lk_90, aes(ymin = lcl, ymax = ucl),
*               width = 0.15, colour = "red")
```

---

### Confidence intervals at different confidence levels

<!-- -->

---

### A closer look at the confidence intervals at the 95% confidence level

.pull-left[
```r
ggplot(lk_95, aes(x = group_cat, y = mean)) +
  geom_point() +
  geom_errorbar(
    aes(ymin = lcl, ymax = ucl),
    width = 0.1)
```
]

.pull-right[
<!-- -->
]

---

### Exercise 3

```r
ee8a <- r8 %>% 
  filter(cntry == "EE") %>% 
  select(gvsrdcc, edulvlb, pspwght)

# case_when uses the first matching condition, so 313 is labelled
# "Keskharidus" before the broader 213-600 range is checked
ee8a <- ee8a %>% 
  mutate(har = case_when(edulvlb <= 213 ~ "Kuni põhiharidus",
                         edulvlb == 313 ~ "Keskharidus",
                         edulvlb > 213 & edulvlb < 600 ~ "Kutseharidus",
                         edulvlb >= 600 ~ "Kõrgharidus"))

library(forcats)
ee8a$har <- fct_relevel(ee8a$har,
                        "Kuni põhiharidus",
                        "Kutseharidus",
                        "Keskharidus",
                        "Kõrgharidus")
```

---

### Exercise 3

.pull-left[
```r
gs_uv <- ee8a |> 
  drop_na(har) |> 
  group_by(har) |> 
  mean_table(gvsrdcc)
```

```r
ggplot(gs_uv, aes(x = group_cat, y = mean)) +
  geom_point() +
  geom_errorbar(
    aes(ymin = lcl, ymax = ucl),
    width = 0.1
  )
```
]

.pull-right[
<!-- -->
]

---

class: center, middle, inverse

# Confidence limits for the difference of means

---

### Compute the confidence limits for the differences of means

```r
ee8 %>% 
  filter(region == "Kirde-Eesti" | region == "Kesk-Eesti") %>% 
  t.test(lkredcc ~ region, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  lkredcc by region
## t = 1.6829, df = 359.03, p-value = 0.09327
## alternative hypothesis: true difference in means between group Kesk-Eesti and group Kirde-Eesti is not equal to 0
## 95 percent confidence interval:
##  -0.07575068  0.97436200
## sample estimates:
##  mean in group Kesk-Eesti mean in group Kirde-Eesti 
##                  4.530387                  4.081081 
```

---

### Compute the confidence limits for the differences of means

```r
ee8 %>% 
  filter(region == "Kirde-Eesti" | region == "Lääne-Eesti") %>% 
  t.test(lkredcc ~ region, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  lkredcc by region
## t = 2.0454, df = 393.47, p-value = 0.04148
## alternative hypothesis: true difference in means between group Lääne-Eesti and group Kirde-Eesti is not equal to 0
## 95 percent confidence interval:
##  0.01893274 0.95683613
## sample estimates:
## mean in group Lääne-Eesti mean in group Kirde-Eesti 
##                  4.568966                  4.081081 
```

---

### Exercise 4

```r
ee8a %>% 
  filter(har == "Kuni põhiharidus" | har == "Kutseharidus") %>% 
  t.test(gvsrdcc ~ har, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  gvsrdcc by har
## t = -2.0146, df = 598.41, p-value = 0.04439
## alternative hypothesis: true difference in means between group Kuni põhiharidus and group Kutseharidus is not equal to 0
## 95 percent confidence interval:
##  -0.612921963 -0.007803079
## sample estimates:
## mean in group Kuni põhiharidus     mean in group Kutseharidus 
##                       4.312704                       4.623066 
```

---

### Exercise 4

```r
ee8a %>% 
  filter(har == "Kuni põhiharidus" | har == "Keskharidus") %>% 
  t.test(gvsrdcc ~ har, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  gvsrdcc by har
## t = -2.4051, df = 654.21, p-value = 0.01644
## alternative hypothesis: true difference in means between group Kuni põhiharidus and group Keskharidus is not equal to 0
## 95 percent confidence interval:
##  -0.76908464 -0.07773041
## sample estimates:
## mean in group Kuni põhiharidus      mean in group Keskharidus 
##                       4.312704                       4.736111 
```

---

### Exercise 4

```r
ee8a %>% 
  filter(har == "Kutseharidus" | har == "Keskharidus") %>% 
  t.test(gvsrdcc ~ har, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  gvsrdcc by har
## t = -0.75807, df = 722.09, p-value = 0.4487
## alternative hypothesis: true difference in means between group Kutseharidus and group Keskharidus is not equal to 0
## 95 percent confidence interval:
##  -0.4058079  0.1797179
## sample estimates:
## mean in group Kutseharidus  mean in group Keskharidus 
##                   4.623066                   4.736111 
```

---

### Exercise 4

```r
ee8a %>% 
  filter(har == "Keskharidus" | har == "Kõrgharidus") %>% 
  t.test(gvsrdcc ~ har, data = .)
```

```
## 
##     Welch Two Sample t-test
## 
## data:  gvsrdcc by har
## t = -2.4046, df = 712.09, p-value = 0.01645
## alternative hypothesis: true difference in means between group Keskharidus and group Kõrgharidus is not equal to 0
## 95 percent confidence interval:
##  -0.65796689 -0.06647137
## sample estimates:
## mean in group Keskharidus mean in group Kõrgharidus 
##                  4.736111                  5.098330 
```

---

class: center, middle, inverse

# Computing confidence intervals with weights

---

### Computing confidence intervals with weights

- For simplicity, the preceding examples were computed without weights; for more accurate results with ESS data, the post-stratification weights should be used as well
- This requires the `survey` package

```r
# install.packages("survey")
library(survey)

ee8w <- svydesign(id = ~1, data = ee8, weights = ~pspwght)
```

---

### Compute the weighted mean and confidence limits of the variable lkredcc

```r
svymean(~lkredcc, design = ee8w, na.rm = TRUE)
```

```
##           mean     SE
## lkredcc 4.7346 0.0559
```

--

```r
svymean(~lkredcc, design = ee8w, na.rm = TRUE) %>% 
  confint()
```

```
##            2.5 %   97.5 %
## lkredcc 4.625124 4.844147
```

---

### Compute the weighted means and confidence limits of the variable lkredcc by region

```r
svyby(~lkredcc, ~region, design = ee8w, FUN = svymean, na.rm = TRUE,
      vartype = c("se", "ci"))
```

```
##                  region  lkredcc        se     ci_l     ci_u
## Põhja-Eesti Põhja-Eesti 5.006927 0.0831522 4.843952 5.169903
## Lääne-Eesti Lääne-Eesti 4.603206 0.1554963 4.298439 4.907973
## Kesk-Eesti   Kesk-Eesti 4.509991 0.1950830 4.127636 4.892347
## Kirde-Eesti Kirde-Eesti 4.150544 0.1833316 3.791220 4.509867
## Lõuna-Eesti Lõuna-Eesti 4.631699 0.1097541 4.416585 4.846813
```

---

### Independent-samples t-test with weighted data

Create a separate dataset for each region.

```r
# install.packages("weights")
library(weights)

lk_ki <- ee8 %>% 
  filter(region == "Kirde-Eesti")

lk_ke <- ee8 %>% 
  filter(region == "Kesk-Eesti")

# note: lk_le holds Lõuna-Eesti, not Lääne-Eesti
lk_le <- ee8 %>% 
  filter(region == "Lõuna-Eesti")
```

---

### Independent-samples t-test with weighted data

```r
wtd.t.test(lk_ki$lkredcc, lk_ke$lkredcc,
           weight = lk_ki$pspwght, weighty = lk_ke$pspwght)
```

```
## Warning in wtd.t.test(lk_ki$lkredcc, lk_ke$lkredcc, weight = lk_ki$pspwght, :
## Treating data for x and y separately because they are of different lengths
```

```
## $test
## [1] "Two Sample Weighted T-Test (Welch)"
## 
## $coefficients
##     t.value          df     p.value 
##  -1.3606133 363.2974703   0.1744794 
## 
## $additional
## Difference     Mean.x     Mean.y   Std. Err 
## -0.3594476  4.1505435  4.5099911  0.2641806 
```

---

### Independent-samples t-test with weighted data

```r
wtd.t.test(lk_ki$lkredcc, lk_le$lkredcc,
           weight = lk_ki$pspwght, weighty = lk_le$pspwght)
```

```
## Warning in wtd.t.test(lk_ki$lkredcc, lk_le$lkredcc, weight = lk_ki$pspwght, :
## Treating data for x and y separately because they are of different lengths
```

```
## $test
## [1] "Two Sample Weighted T-Test (Welch)"
## 
## $coefficients
##      t.value           df      p.value 
##  -2.30706395 329.45292273   0.02167097 
## 
## $additional
## Difference     Mean.x     Mean.y   Std. Err 
## -0.4811557  4.1505435  4.6316992  0.2085576 
```

---

### `wtd.t.test` does not return confidence limits for the difference of weighted means

- But they can be computed by hand from the standard error
- Assign the test results to a new object `t_test_ki_le`

```r
t_test_ki_le <- wtd.t.test(lk_ki$lkredcc, lk_le$lkredcc,
                           weight = lk_ki$pspwght, weighty = lk_le$pspwght)
```

---

### Compute the confidence limits for the difference of means

```r
str(t_test_ki_le)
```

```
## List of 3
##  $ test        : chr "Two Sample Weighted T-Test (Welch)"
##  $ coefficients: Named num [1:3] -2.3071 329.4529 0.0217
##   ..- attr(*, "names")= chr [1:3] "t.value" "df" "p.value"
##  $ additional  : Named num [1:4] -0.481 4.151 4.632 0.209
##   ..- attr(*, "names")= chr [1:4] "Difference" "Mean.x" "Mean.y" "Std.
Err"
```

- Lower confidence limit at the 95% confidence level

```r
# element 1 is the difference of means, element 4 its standard error;
# 1.96 is the normal-approximation multiplier for a 95% interval
t_test_ki_le$additional[1] - 1.96 * t_test_ki_le$additional[4]
```

```
## Difference 
## -0.8899285 
```

- Upper confidence limit at the 95% confidence level

```r
t_test_ki_le$additional[1] + 1.96 * t_test_ki_le$additional[4]
```

```
##  Difference 
## -0.07238281 
```

---

### Exercise 5

```r
ee8w <- svydesign(id = ~1, data = ee8a, weights = ~pspwght)

svyby(~gvsrdcc, ~har, design = ee8w, FUN = svymean, na.rm = TRUE,
      vartype = c("se", "ci"))
```

```
##                               har  gvsrdcc         se     ci_l     ci_u
## Kuni põhiharidus Kuni põhiharidus 4.362288 0.12484189 4.117603 4.606974
## Kutseharidus         Kutseharidus 4.610041 0.08721434 4.439104 4.780978
## Keskharidus           Keskharidus 4.755176 0.12423536 4.511679 4.998673
## Kõrgharidus           Kõrgharidus 5.094212 0.08988150 4.918048 5.270377
```

```r
gs_uv_w <- svyby(~gvsrdcc, ~har, design = ee8w, FUN = svymean, na.rm = TRUE,
                 vartype = c("se", "ci"))
```

```r
ggplot(gs_uv_w, aes(x = har, y = gvsrdcc)) +
  geom_point(stat = "identity") +
  geom_errorbar(aes(ymin = ci_l, ymax = ci_u), width = 0.1) +
  labs(title = "Vahemikdiagramm kaalutud andmetega") +
  ylim(4, 5.5)
```

---

### Exercise 5

.pull-left[
<!-- -->
]

.pull-right[
<!-- -->
]

---

class: center, middle, inverse

# Confidence interval for a proportion

---

### Confidence limits for a proportion (percentage)

`$$p \geq \hat{p} - z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$`

`$$p \leq \hat{p} + z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$`

- `\(p\)` -- proportion in the population
- `\(\hat{p}\)` -- proportion in the sample
- `\(z_{1 - \frac{\alpha}{2}}\)` -- the `\(1 - \frac{\alpha}{2}\)` quantile of the standard normal distribution, where `\(\alpha\)` is the upper bound on the probability of error
  - e.g. at the 90% confidence level `\(\alpha = 10\%\)` and the corresponding `\(z_{1 - \frac{\alpha}{2}} = 1.645\)`
  - e.g. at the 95% confidence level `\(\alpha = 5\%\)` and the corresponding `\(z_{1 - \frac{\alpha}{2}} = 1.96\)`
  - e.g. at the 99% confidence level `\(\alpha = 1\%\)` and the corresponding `\(z_{1 - \frac{\alpha}{2}} = 2.58\)`
- `\(n\)` -- sample size
- The closer the proportion is to 50%, the wider the confidence interval

---

### Width of the confidence interval for a proportion

- How does the confidence interval of a proportion depend on the size of the proportion (percentage)?
- An example for a sample of n = 1000

```r
p <- seq(0, 1, 0.01)
dat <- data.frame(p)
dat$veapiir <- 1.96 * sqrt(p * (1 - p) / 1000)   # veapiir = margin of error
knitr::kable(dat, format = 'html')
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> p </th>
   <th style="text-align:right;"> veapiir </th>
  </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> 0.00 </td> <td style="text-align:right;"> 0.0000000 </td> </tr>
  <tr> <td style="text-align:right;"> 0.01 </td> <td style="text-align:right;"> 0.0061670 </td> </tr>
  <tr> <td style="text-align:right;"> 0.02 </td> <td style="text-align:right;"> 0.0086773 </td> </tr>
  <tr> <td style="text-align:right;"> 0.03 </td> <td style="text-align:right;"> 0.0105731 </td> </tr>
  <tr> <td style="text-align:right;"> 0.04 </td> <td style="text-align:right;"> 0.0121457 </td> </tr>
  <tr> <td style="text-align:right;"> 0.05 </td> <td style="text-align:right;"> 0.0135084 </td> </tr>
  <tr> <td style="text-align:right;"> 0.06 </td> <td style="text-align:right;"> 0.0147196 </td> </tr>
  <tr> <td style="text-align:right;"> 0.07 </td> <td style="text-align:right;"> 0.0158142 </td> </tr>
  <tr> <td style="text-align:right;"> 0.08 </td> <td style="text-align:right;"> 0.0168149 </td> </tr>
  <tr> <td style="text-align:right;"> 0.09 </td> <td style="text-align:right;"> 0.0177377 </td> </tr>
  <tr> <td style="text-align:right;"> 0.10 </td> <td style="text-align:right;"> 0.0185942 </td> </tr>
  <tr> <td style="text-align:right;"> 0.11 </td> <td style="text-align:right;"> 0.0193931 </td> </tr>
  <tr> <td style="text-align:right;"> 0.12 </td> <td style="text-align:right;"> 0.0201413 </td> </tr>
  <tr> <td style="text-align:right;"> 0.13 </td> <td style="text-align:right;"> 0.0208443 </td> </tr>
  <tr> <td style="text-align:right;"> 0.14 </td> <td style="text-align:right;"> 0.0215065 </td> </tr>
  <tr> <td style="text-align:right;"> 0.15 </td> <td
style="text-align:right;"> 0.0221315 </td> </tr> <tr> <td style="text-align:right;"> 0.16 </td> <td style="text-align:right;"> 0.0227225 </td> </tr> <tr> <td style="text-align:right;"> 0.17 </td> <td style="text-align:right;"> 0.0232820 </td> </tr> <tr> <td style="text-align:right;"> 0.18 </td> <td style="text-align:right;"> 0.0238122 </td> </tr> <tr> <td style="text-align:right;"> 0.19 </td> <td style="text-align:right;"> 0.0243151 </td> </tr> <tr> <td style="text-align:right;"> 0.20 </td> <td style="text-align:right;"> 0.0247923 </td> </tr> <tr> <td style="text-align:right;"> 0.21 </td> <td style="text-align:right;"> 0.0252452 </td> </tr> <tr> <td style="text-align:right;"> 0.22 </td> <td style="text-align:right;"> 0.0256753 </td> </tr> <tr> <td style="text-align:right;"> 0.23 </td> <td style="text-align:right;"> 0.0260835 </td> </tr> <tr> <td style="text-align:right;"> 0.24 </td> <td style="text-align:right;"> 0.0264709 </td> </tr> <tr> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.0268384 </td> </tr> <tr> <td style="text-align:right;"> 0.26 </td> <td style="text-align:right;"> 0.0271868 </td> </tr> <tr> <td style="text-align:right;"> 0.27 </td> <td style="text-align:right;"> 0.0275169 </td> </tr> <tr> <td style="text-align:right;"> 0.28 </td> <td style="text-align:right;"> 0.0278292 </td> </tr> <tr> <td style="text-align:right;"> 0.29 </td> <td style="text-align:right;"> 0.0281245 </td> </tr> <tr> <td style="text-align:right;"> 0.30 </td> <td style="text-align:right;"> 0.0284031 </td> </tr> <tr> <td style="text-align:right;"> 0.31 </td> <td style="text-align:right;"> 0.0286656 </td> </tr> <tr> <td style="text-align:right;"> 0.32 </td> <td style="text-align:right;"> 0.0289125 </td> </tr> <tr> <td style="text-align:right;"> 0.33 </td> <td style="text-align:right;"> 0.0291441 </td> </tr> <tr> <td style="text-align:right;"> 0.34 </td> <td style="text-align:right;"> 0.0293608 </td> </tr> <tr> <td style="text-align:right;"> 0.35 </td> <td 
style="text-align:right;"> 0.0295629 </td> </tr> <tr> <td style="text-align:right;"> 0.36 </td> <td style="text-align:right;"> 0.0297507 </td> </tr> <tr> <td style="text-align:right;"> 0.37 </td> <td style="text-align:right;"> 0.0299245 </td> </tr> <tr> <td style="text-align:right;"> 0.38 </td> <td style="text-align:right;"> 0.0300846 </td> </tr> <tr> <td style="text-align:right;"> 0.39 </td> <td style="text-align:right;"> 0.0302311 </td> </tr> <tr> <td style="text-align:right;"> 0.40 </td> <td style="text-align:right;"> 0.0303642 </td> </tr> <tr> <td style="text-align:right;"> 0.41 </td> <td style="text-align:right;"> 0.0304841 </td> </tr> <tr> <td style="text-align:right;"> 0.42 </td> <td style="text-align:right;"> 0.0305911 </td> </tr> <tr> <td style="text-align:right;"> 0.43 </td> <td style="text-align:right;"> 0.0306851 </td> </tr> <tr> <td style="text-align:right;"> 0.44 </td> <td style="text-align:right;"> 0.0307664 </td> </tr> <tr> <td style="text-align:right;"> 0.45 </td> <td style="text-align:right;"> 0.0308350 </td> </tr> <tr> <td style="text-align:right;"> 0.46 </td> <td style="text-align:right;"> 0.0308910 </td> </tr> <tr> <td style="text-align:right;"> 0.47 </td> <td style="text-align:right;"> 0.0309345 </td> </tr> <tr> <td style="text-align:right;"> 0.48 </td> <td style="text-align:right;"> 0.0309655 </td> </tr> <tr> <td style="text-align:right;"> 0.49 </td> <td style="text-align:right;"> 0.0309841 </td> </tr> <tr> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.0309903 </td> </tr> <tr> <td style="text-align:right;"> 0.51 </td> <td style="text-align:right;"> 0.0309841 </td> </tr> <tr> <td style="text-align:right;"> 0.52 </td> <td style="text-align:right;"> 0.0309655 </td> </tr> <tr> <td style="text-align:right;"> 0.53 </td> <td style="text-align:right;"> 0.0309345 </td> </tr> <tr> <td style="text-align:right;"> 0.54 </td> <td style="text-align:right;"> 0.0308910 </td> </tr> <tr> <td style="text-align:right;"> 0.55 </td> <td 
style="text-align:right;"> 0.0308350 </td> </tr> <tr> <td style="text-align:right;"> 0.56 </td> <td style="text-align:right;"> 0.0307664 </td> </tr> <tr> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.0306851 </td> </tr> <tr> <td style="text-align:right;"> 0.58 </td> <td style="text-align:right;"> 0.0305911 </td> </tr> <tr> <td style="text-align:right;"> 0.59 </td> <td style="text-align:right;"> 0.0304841 </td> </tr> <tr> <td style="text-align:right;"> 0.60 </td> <td style="text-align:right;"> 0.0303642 </td> </tr> <tr> <td style="text-align:right;"> 0.61 </td> <td style="text-align:right;"> 0.0302311 </td> </tr> <tr> <td style="text-align:right;"> 0.62 </td> <td style="text-align:right;"> 0.0300846 </td> </tr> <tr> <td style="text-align:right;"> 0.63 </td> <td style="text-align:right;"> 0.0299245 </td> </tr> <tr> <td style="text-align:right;"> 0.64 </td> <td style="text-align:right;"> 0.0297507 </td> </tr> <tr> <td style="text-align:right;"> 0.65 </td> <td style="text-align:right;"> 0.0295629 </td> </tr> <tr> <td style="text-align:right;"> 0.66 </td> <td style="text-align:right;"> 0.0293608 </td> </tr> <tr> <td style="text-align:right;"> 0.67 </td> <td style="text-align:right;"> 0.0291441 </td> </tr> <tr> <td style="text-align:right;"> 0.68 </td> <td style="text-align:right;"> 0.0289125 </td> </tr> <tr> <td style="text-align:right;"> 0.69 </td> <td style="text-align:right;"> 0.0286656 </td> </tr> <tr> <td style="text-align:right;"> 0.70 </td> <td style="text-align:right;"> 0.0284031 </td> </tr> <tr> <td style="text-align:right;"> 0.71 </td> <td style="text-align:right;"> 0.0281245 </td> </tr> <tr> <td style="text-align:right;"> 0.72 </td> <td style="text-align:right;"> 0.0278292 </td> </tr> <tr> <td style="text-align:right;"> 0.73 </td> <td style="text-align:right;"> 0.0275169 </td> </tr> <tr> <td style="text-align:right;"> 0.74 </td> <td style="text-align:right;"> 0.0271868 </td> </tr> <tr> <td style="text-align:right;"> 0.75 </td> <td 
style="text-align:right;"> 0.0268384 </td> </tr> <tr> <td style="text-align:right;"> 0.76 </td> <td style="text-align:right;"> 0.0264709 </td> </tr> <tr> <td style="text-align:right;"> 0.77 </td> <td style="text-align:right;"> 0.0260835 </td> </tr> <tr> <td style="text-align:right;"> 0.78 </td> <td style="text-align:right;"> 0.0256753 </td> </tr> <tr> <td style="text-align:right;"> 0.79 </td> <td style="text-align:right;"> 0.0252452 </td> </tr> <tr> <td style="text-align:right;"> 0.80 </td> <td style="text-align:right;"> 0.0247923 </td> </tr> <tr> <td style="text-align:right;"> 0.81 </td> <td style="text-align:right;"> 0.0243151 </td> </tr> <tr> <td style="text-align:right;"> 0.82 </td> <td style="text-align:right;"> 0.0238122 </td> </tr> <tr> <td style="text-align:right;"> 0.83 </td> <td style="text-align:right;"> 0.0232820 </td> </tr> <tr> <td style="text-align:right;"> 0.84 </td> <td style="text-align:right;"> 0.0227225 </td> </tr> <tr> <td style="text-align:right;"> 0.85 </td> <td style="text-align:right;"> 0.0221315 </td> </tr> <tr> <td style="text-align:right;"> 0.86 </td> <td style="text-align:right;"> 0.0215065 </td> </tr> <tr> <td style="text-align:right;"> 0.87 </td> <td style="text-align:right;"> 0.0208443 </td> </tr> <tr> <td style="text-align:right;"> 0.88 </td> <td style="text-align:right;"> 0.0201413 </td> </tr> <tr> <td style="text-align:right;"> 0.89 </td> <td style="text-align:right;"> 0.0193931 </td> </tr> <tr> <td style="text-align:right;"> 0.90 </td> <td style="text-align:right;"> 0.0185942 </td> </tr> <tr> <td style="text-align:right;"> 0.91 </td> <td style="text-align:right;"> 0.0177377 </td> </tr> <tr> <td style="text-align:right;"> 0.92 </td> <td style="text-align:right;"> 0.0168149 </td> </tr> <tr> <td style="text-align:right;"> 0.93 </td> <td style="text-align:right;"> 0.0158142 </td> </tr> <tr> <td style="text-align:right;"> 0.94 </td> <td style="text-align:right;"> 0.0147196 </td> </tr> <tr> <td style="text-align:right;"> 0.95 </td> <td 
style="text-align:right;"> 0.0135084 </td> </tr>
  <tr> <td style="text-align:right;"> 0.96 </td> <td style="text-align:right;"> 0.0121457 </td> </tr>
  <tr> <td style="text-align:right;"> 0.97 </td> <td style="text-align:right;"> 0.0105731 </td> </tr>
  <tr> <td style="text-align:right;"> 0.98 </td> <td style="text-align:right;"> 0.0086773 </td> </tr>
  <tr> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 0.0061670 </td> </tr>
  <tr> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.0000000 </td> </tr>
</tbody>
</table>

---

### Width of the confidence interval for a proportion

```r
joonis <- ggplot(dat, aes(p, veapiir)) +
  geom_path()
```

---

### Width of the confidence interval for a proportion

```r
plotly::ggplotly(joonis)
```
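
The dependence on both the proportion and the sample size can be sketched with a small helper. This is only an illustration: `margin_of_error` is a hypothetical function name that does not appear in the slides, and it assumes simple random sampling and the normal approximation from the formula above.

```r
# Hypothetical helper (not part of the original slides):
# margin of error of a proportion under the normal approximation.
margin_of_error <- function(p, n, conf_level = 0.95) {
  # z is the 1 - alpha/2 quantile of the standard normal distribution
  z <- qnorm(1 - (1 - conf_level) / 2)
  z * sqrt(p * (1 - p) / n)
}

# The margin is widest at p = 0.5 and shrinks towards 0 and 1
margin_of_error(0.5, 1000)
margin_of_error(0.1, 1000)

# Quadrupling the sample size halves the margin of error
margin_of_error(0.5, 4000)
```

Because `qnorm` returns the exact quantile rather than the rounded 1.96, the values may differ from the table above in the last decimals.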

---

### Margin of error

- Half the width of the confidence interval, i.e. `\(z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)`, is the margin of error
- It is valid only for surveys based on a probability sample
  - This assumption applies to every kind of inferential analysis

---

### Computing confidence intervals for proportions

- We use data from two party rating surveys (from 2020):
  - Turu-uuringute AS, 12-24 August, n = 1003 (https://www.err.ee/1127504/turu-uuringud-reformierakonna-toetus-jai-augustis-keskerakonnale-alla)
  - Turu-uuringute AS, 7-17 September, n = 1010 (https://www.err.ee/1138657/reitingud-reformierakond-tousis-taas-populaarseimaks-parteiks)
- The example is not ideal for our purposes because the ratings are rounded to whole percentages, but for the sake of the exercise we treat them as the results obtained from the survey

---

### Computing confidence intervals for proportions

```r
# erakond = party, poolehoid = support, kuu = month
erakond <- rep(c("RE", "KE", "EKRE", "SDE", "Eesti 200", "Isamaa",
                 "Rohelised", "Tulevik", "Muu"), 2)
reiting <- data.frame(erakond)
reiting$poolehoid <- c(0.23, 0.26, 0.20, 0.10, 0.09, 0.05, 0.02, 0.01, 0.04,
                       0.27, 0.24, 0.16, 0.11, 0.09, 0.05, 0.03, 0.03, 0.02)
reiting$kuu <- c(rep("August", 9), rep("September", 9))
reiting
```

```
##      erakond poolehoid       kuu
## 1         RE      0.23    August
## 2         KE      0.26    August
## 3       EKRE      0.20    August
## 4        SDE      0.10    August
## 5  Eesti 200      0.09    August
## 6     Isamaa      0.05    August
## 7  Rohelised      0.02    August
## 8    Tulevik      0.01    August
## 9        Muu      0.04    August
## 10        RE      0.27 September
## 11        KE      0.24 September
## 12      EKRE      0.16 September
## 13       SDE      0.11 September
## 14 Eesti 200      0.09 September
## 15    Isamaa      0.05 September
## 16 Rohelised      0.03 September
## 17   Tulevik      0.03 September
## 18       Muu      0.02 September
```

---

### Let's also look at the results in a plot

```r
ggplot(reiting, aes(x = erakond, y = poolehoid, fill = kuu)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.6) +
  scale_x_discrete(limits = reiting$erakond[1:9]) +
  labs(x = "Erakond", y = "Osakaal") +
  theme_light()
```

---

### Let's also look at the results in a plot

<img
src="data:image/png;base64,#km_04_usaldusvahemik_kood_files/figure-html/unnamed-chunk-41-1.png" width="100%" />

---

### Computing confidence intervals for proportions

- The news headline said that RE had once again become the most popular party
- But can we claim from the survey data that the change in the support percentages of RE and KE also holds in the population (assuming the whole-number percentages are exact)?
- We compute the confidence limits and add them to the plot

---

### Computing confidence intervals for proportions

- To compute the confidence intervals we use the function `MultinomCI`, which expects counts of individuals rather than a percentage distribution
- So we first convert the ratings into numbers of respondents (i.e. how many respondents prefer each party)

```r
reiting <- reiting %>% 
  mutate(poolehoid_n = ifelse(kuu == "August",
                              poolehoid * 1003,
                              poolehoid * 1010) %>% 
           round())
```

---

### Computing confidence intervals for proportions

- We compute the confidence limits with the function `MultinomCI` from the package `DescTools`
- The confidence limits have to be computed separately for August and September, because each month's ratings form a whole of their own (the ratings of one month sum to 100%)

```r
# install.packages("DescTools")
library(DescTools)

poolehoid_aug <- reiting %>% 
  filter(kuu == "August") %>% 
  pull(poolehoid_n) %>% 
  MultinomCI(conf.level = 0.95, method = "wilson")

poolehoid_sept <- reiting %>% 
  filter(kuu == "September") %>% 
  pull(poolehoid_n) %>% 
  MultinomCI(conf.level = 0.95, method = "wilson")
```

---

### Combine the confidence limits of the ratings into one table, the object `poolehoid_ci`

```r
poolehoid_ci <- rbind(poolehoid_aug, poolehoid_sept) %>% 
  as.data.frame()

poolehoid_ci
```

```
##           est      lwr.ci     upr.ci
## 1  0.23030907 0.205311231 0.25736485
## 2  0.26021934 0.234017535 0.28825085
## 3  0.20039880 0.176789596 0.22629418
## 4  0.09970090 0.082659353 0.11979701
## 5  0.08973081 0.073573675 0.10901859
## 6  0.04985045 0.038015603 0.06512026
## 7  0.01994018 0.012944740 0.03059882
## 8  0.00997009 0.005424455 0.01825500
## 9  0.03988036 0.029422300
0.05384946
## 10 0.27056492 0.244061881 0.29880833
## 11 0.23984143 0.214513526 0.26714276
## 12 0.16055500 0.139196600 0.18448827
## 13 0.11000991 0.092162180 0.13081592
## 14 0.09018831 0.074033075 0.10945217
## 15 0.04955401 0.037788342 0.06473655
## 16 0.02973241 0.020904886 0.04212715
## 17 0.02973241 0.020904886 0.04212715
## 18 0.01982161 0.012867550 0.03041806
```

---

### Add the party names and months from the original data

```r
poolehoid_ci$erakond <- reiting$erakond
poolehoid_ci$kuu <- reiting$kuu

poolehoid_ci
```

```
##           est      lwr.ci     upr.ci   erakond       kuu
## 1  0.23030907 0.205311231 0.25736485        RE    August
## 2  0.26021934 0.234017535 0.28825085        KE    August
## 3  0.20039880 0.176789596 0.22629418      EKRE    August
## 4  0.09970090 0.082659353 0.11979701       SDE    August
## 5  0.08973081 0.073573675 0.10901859 Eesti 200    August
## 6  0.04985045 0.038015603 0.06512026    Isamaa    August
## 7  0.01994018 0.012944740 0.03059882 Rohelised    August
## 8  0.00997009 0.005424455 0.01825500   Tulevik    August
## 9  0.03988036 0.029422300 0.05384946       Muu    August
## 10 0.27056492 0.244061881 0.29880833        RE September
## 11 0.23984143 0.214513526 0.26714276        KE September
## 12 0.16055500 0.139196600 0.18448827      EKRE September
## 13 0.11000991 0.092162180 0.13081592       SDE September
## 14 0.09018831 0.074033075 0.10945217 Eesti 200 September
## 15 0.04955401 0.037788342 0.06473655    Isamaa September
## 16 0.02973241 0.020904886 0.04212715 Rohelised September
## 17 0.02973241 0.020904886 0.04212715   Tulevik September
## 18 0.01982161 0.012867550 0.03041806       Muu September
```

---

### Plot the ratings together with their confidence limits

```r
library(scales)

ggplot(poolehoid_ci, aes(x = erakond, y = est, fill = kuu)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.9) +
  geom_errorbar(aes(ymin = lwr.ci, ymax = upr.ci),
                position = position_dodge2(width = 0.6, padding = 0.6),
                color = "black") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1L)) +
  scale_x_discrete(limits = poolehoid_ci$erakond[1:9]) +
  labs(x = "Erakond", y = "Osakaal") +
  theme_light()
```

---

### Put the ratings together with their confidence limits on a plot

<img src="data:image/png;base64,#km_04_usaldusvahemik_kood_files/figure-html/unnamed-chunk-46-1.png" width="100%" />

---

### Comparing confidence intervals

- The confidence intervals overlap considerably
- Judging only by the overlap of the confidence intervals, one could say that RE's rating may indeed have risen, but it may also be that, because of sampling randomness, the ratings were the same or quite similar in both months
- The same could be said of KE
- However, as with comparing means, here too we should look at the confidence interval of the difference in proportions (i.e. the confidence interval of RE's rating change and the confidence interval of KE's rating change)
- How do we compute confidence limits for a difference in proportions?

---

### Confidence limits for a difference in proportions

- How do we compute confidence limits for the difference between two proportions?
- The formula differs depending on whether the proportions come
    - from two different distributions (two mutually independent groups)
    - or from the same distribution (the proportions are parts of one whole)
- E.g. when studying party ratings:
    - are we studying the difference in one party's ratings at two different points in time
    - or the differences between the parties' ratings at the same point in time
- First case: the two-proportion *z*-test (*z*-test for independent groups)

---

### Two-proportion *z*-test

`$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$$`

- `\(\hat{p}_1\)` -- the proportion of the group in the first sample `\((n_1)\)`
- `\(\hat{p}_2\)` -- the proportion of the group in the second sample `\((n_2)\)`
- `\(\hat{p}\)` -- the pooled proportion of the group across both samples: `\(\frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1+n_2}\)`
- `\(n_1\)` -- size of the first sample
- `\(n_2\)` -- size of the second sample

---

### Difference in proportions based on the two-proportion *z*-test

`$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$$`

- We use the function `prop.test`, which takes as separate arguments
a vector of the counts of individuals in each group `\(c(\hat{p}_1 n_1, \hat{p}_2 n_2)\)` and a vector of the sample sizes `\(c(n_1, n_2)\)`.
- Vector of counts:

```r
# Number of RE supporters in August and in September
ind_arv_gruppides <- reiting %>% 
  filter(erakond == "RE") %>% 
  pull(poolehoid_n)

ind_arv_gruppides
```

```
## [1] 231 273
```

---

### Difference in proportions based on the two-proportion *z*-test

`$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$$`

Compute how many respondents there were in August and in September

```r
# Total number of respondents in August
n1 <- reiting %>% 
  filter(kuu == "August") %>% 
  pull(poolehoid_n) %>% 
  sum()

n1
```

```
## [1] 1003
```

```r
# Total number of respondents in September
n2 <- reiting %>% 
  filter(kuu == "September") %>% 
  pull(poolehoid_n) %>% 
  sum()

n2
```

```
## [1] 1009
```

---

### Difference in proportions based on the two-proportion *z*-test

`$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$$`

```r
# correct = FALSE switches off the continuity correction
prop.test(ind_arv_gruppides, c(n1, n2), correct = FALSE)
```

```
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  ind_arv_gruppides out of c(n1, n2)
## X-squared = 4.3416, df = 1, p-value = 0.03719
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.078075315 -0.002436371
## sample estimates:
##    prop 1    prop 2 
## 0.2303091 0.2705649
```

---

### Difference in proportions based on the two-proportion *z*-test

- The result shows that at the 95% confidence level we can indeed claim a change in RE's rating
    - the confidence interval does not contain zero, although, admittedly, its upper end is not far from zero
    - the same is shown by the p-value of the test statistic, which is smaller than 0.05 (the significance level corresponding to the 95% confidence level)
- `prop.test` reports a chi-squared statistic rather than *Z*, but the p-value is the same (without the continuity correction the chi-squared statistic equals `\(Z^2\)`)

---

### Exercise 6

- RE's rating rose between August and September. Can we also claim that KE's rating fell over the same period?

---

### Confidence limits for the difference between proportions from the same distribution

- How do we compute confidence limits for a difference in proportions when the proportions come from the same percentage distribution?
- E.g. the difference between ratings at the same point in time
- Formula for the confidence limits of the difference in proportions (Scott and Seber 1983):

`$$p_1 - p_2 \geq \hat{p}_1 - \hat{p}_2 - z_{1 - \frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1+\hat{p}_2-(\hat{p}_1-\hat{p}_2)^2}{n}}$$`

`$$p_1 - p_2 \leq \hat{p}_1 - \hat{p}_2 + z_{1 - \frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1+\hat{p}_2-(\hat{p}_1-\hat{p}_2)^2}{n}}$$`

- `\(p_1, p_2\)` -- proportions of the first and second group in the population
- `\(\hat{p}_1, \hat{p}_2\)` -- proportions of the first and second group in the sample
- `\(z_{1 - \frac{\alpha}{2}}\)` -- the `\(1 - \frac{\alpha}{2}\)` quantile of the standard normal distribution
- `\(n\)` -- sample size

---

### Confidence limits for the difference between proportions from the same distribution

- The news headline said that in September RE once again became the most popular party
- Can we nevertheless say, based on the survey data, that the September ratings of RE and KE also differ in the population (assuming that the whole-number percentages are exact and that the survey is based on a simple random sample)?
- More precisely, is there reason to claim, at the 95% confidence level, that the September ratings of RE and KE differ?
- Let's compute the confidence limits for the difference in ratings
- For this we need the September ratings of RE and KE and the total number of respondents in September (the latter was computed earlier in the object `n2`).
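
The quantile `\(z_{1 - \frac{\alpha}{2}}\)` in the formula does not have to be looked up in a table. A minimal sketch, assuming the 95% confidence level used throughout these slides; the object names `alpha` and `z` are illustrative and do not appear in the original code:

```r
# Assumption: 95% confidence level, i.e. alpha = 0.05
alpha <- 0.05

# The (1 - alpha / 2) quantile of the standard normal distribution
z <- qnorm(1 - alpha / 2)
z
```

```
## [1] 1.959964
```

Rounded to two decimals this is the familiar 1.96.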
---

### September ratings of RE and KE

- RE's September rating

```r
# Proportion (not count) of respondents supporting RE in September
p1 <- reiting %>% 
  filter(erakond == "RE" & kuu == "September") %>% 
  pull(poolehoid)
```

- KE's September rating

```r
# Proportion (not count) of respondents supporting KE in September
p2 <- reiting %>% 
  filter(erakond == "KE" & kuu == "September") %>% 
  pull(poolehoid)
```

---

### Confidence limits for the difference between the September ratings of RE and KE

- Difference between the ratings in the sample

```r
p1 - p2
```

```
## [1] 0.03
```

- Lower confidence limit: `\(p_1 - p_2 \geq \hat{p}_1 - \hat{p}_2 - z_{1 - \frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1+\hat{p}_2-(\hat{p}_1-\hat{p}_2)^2}{n}}\)`

```r
# 1.96 is the rounded 0.975 quantile of the standard normal distribution
p1 - p2 - 1.96 * sqrt((p1 + p2 - (p1 - p2)^2) / n2)
```

```
## [1] -0.01402628
```

- Upper confidence limit: `\(p_1 - p_2 \leq \hat{p}_1 - \hat{p}_2 + z_{1 - \frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1+\hat{p}_2-(\hat{p}_1-\hat{p}_2)^2}{n}}\)`

```r
p1 - p2 + 1.96 * sqrt((p1 + p2 - (p1 - p2)^2) / n2)
```

```
## [1] 0.07402628
```

- What can we conclude from these confidence limits? (NB! The data are for September 2020)

---

### Exercise 7

- A difference between the KE and RE ratings could not be claimed for September 2020. Was KE's lead in August 2020 nevertheless larger than the statistical error, i.e. could KE's rating be claimed to be higher than RE's?
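
---

### A reusable helper for the same-distribution interval (sketch)

The lower and upper limits were computed by hand, with the quantile hard-coded as 1.96. The same steps can be wrapped in a small function. This is only a sketch: the function name `prop_diff_ci` and its arguments are hypothetical and not part of the original slides or of any package, and it relies on the same assumptions as the manual calculation (simple random sample, both proportions from one distribution).

```r
# Hypothetical helper (not from the original slides): confidence limits for
# the difference of two proportions taken from the same distribution,
# using the Scott and Seber (1983) formula shown on the slides
prop_diff_ci <- function(p1, p2, n, conf.level = 0.95) {
  # Quantile of the standard normal distribution instead of a hard-coded 1.96
  z <- qnorm(1 - (1 - conf.level) / 2)
  # Half-width of the interval: z times the standard error of the difference
  half_width <- z * sqrt((p1 + p2 - (p1 - p2)^2) / n)
  c(est = p1 - p2,
    lwr.ci = p1 - p2 - half_width,
    upr.ci = p1 - p2 + half_width)
}

# September 2020 values from the slides: RE 27%, KE 24%, n2 = 1009
prop_diff_ci(0.27, 0.24, 1009)
```

Because `qnorm()` returns the unrounded quantile, the limits should agree with the manual results only up to small rounding differences. With the objects from the slides, the equivalent call would be `prop_diff_ci(p1, p2, n2)`.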